# Open-vocabulary Recognition
Llmdet Swin Large Hf
Apache-2.0
LLMDet is an open-vocabulary object detector supervised by large language models; the paper was a CVPR 2025 highlight.
Object Detection
fushh7
3,428
1
Llmdet Swin Base Hf
Apache-2.0
LLMDet is an open-vocabulary object detector supervised by large language models, capable of zero-shot object detection.
Object Detection
Safetensors
fushh7
605
0
Yoloe V8l Seg
YOLOE is a real-time model that combines object detection and segmentation in a single network, suitable for a range of vision tasks.
Object Detection
jameslahm
4,135
1
Genmedclip
MIT
GenMedClip is a zero-shot image classification model based on the open_clip library, specializing in medical image analysis.
Image Classification
wisdomik
40
0
Eva02 Large Patch14 Clip 336.merged2b
MIT
EVA02 CLIP is a large-scale vision-language model based on the CLIP architecture, supporting tasks such as zero-shot image classification.
Image Classification
timm
197
0
Omdet Turbo Swin Tiny Hf
Apache-2.0
OmDet-Turbo is a real-time, Transformer-based open-vocabulary detection model with an efficient fusion head, suitable for zero-shot object detection.
Object Detection
omlab
36.29k
33
Resnet50x64 Clip.openai
MIT
A CLIP model with the ResNet50x64 architecture, usable through the OpenCLIP library and supporting zero-shot image classification.
Image Classification
timm
622
0
Owlv2 Base Patch16 Ensemble
Apache-2.0
OWLv2 is a zero-shot text-conditioned object detection model that can locate objects in images through text queries.
Object Detection
Transformers
upfeatmediainc
15
0
Owlv2 Large Patch14 Ensemble
Apache-2.0
OWLv2 is a zero-shot text-conditioned object detection model that can locate objects in images through text queries.
Object Detection
Transformers
google
262.77k
25
CLIP ViT B 32 CommonPool.S.basic S13m B4k
MIT
A vision-language model based on the CLIP architecture, supporting zero-shot image classification.
Image Classification
laion
53
0
Owlvit Large Patch14
Apache-2.0
OWL-ViT is a zero-shot text-conditioned object detection model that can locate objects in images through text queries.
Object Detection
Transformers
google
25.01k
25
Owlvit Base Patch16
Apache-2.0
OWL-ViT is a zero-shot text-conditioned object detection model that can detect objects in images through text queries.
Object Detection
Transformers
google
4,588
12
Featured Recommended AI Models